Adaptation of Prosody in Speech Synthesis by Changing Command Values of the Generation Process Model of Fundamental Frequency
Authors
Abstract
A method was developed to adapt prosody to a new speaker or style in speech synthesis. It predicts the differences between the target and original speakers/styles and applies them to the original one. Within the framework of the generation process model, differences in fundamental frequency (F0) contours are represented as differences in the command magnitudes/amplitudes (phrase command magnitudes and accent command amplitudes). While the model for the original speaker/style requires a training corpus of a certain size, the corpus for training the command differences can be small. Furthermore, in the case of style adaptation, this corpus does not have to be uttered by the same speaker as the original style. Speech synthesis was conducted using an HMM-based speech synthesis system, with prosody controlled by the method. Listening experiments on synthetic speech with style adaptation and with voice conversion both showed the validity of the method.
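To make the adaptation step concrete, here is a minimal Python sketch assuming the standard formulation of the generation process model (Fujisaki's model): ln F0(t) = ln Fb + Σ Ap·Gp(t − T0) + Σ Aa·[Ga(t − T1) − Ga(t − T2)]. Following the abstract, it keeps command timings fixed and adds predicted differences only to the phrase command magnitudes (Ap) and accent command amplitudes (Aa). The constants ALPHA, BETA and GAMMA are typical values assumed here, not taken from the paper; the names PhraseCommand, AccentCommand and adapt_commands are hypothetical; and the differences are supplied by hand, since how the paper predicts them from a small corpus is not shown.

```python
import math
from dataclasses import dataclass, replace

# Typical generation-process-model constants (assumed values, not from the paper).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling level of the accent component


@dataclass(frozen=True)
class PhraseCommand:  # hypothetical container for one phrase command
    t0: float  # onset time T0 (s)
    ap: float  # magnitude Ap


@dataclass(frozen=True)
class AccentCommand:  # hypothetical container for one accent command
    t1: float  # onset time T1 (s)
    t2: float  # end time T2 (s)
    aa: float  # amplitude Aa


def gp(t: float) -> float:
    """Impulse response of the phrase control mechanism."""
    return ALPHA * ALPHA * t * math.exp(-ALPHA * t) if t >= 0 else 0.0


def ga(t: float) -> float:
    """Step response of the accent control mechanism, clipped at GAMMA."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + BETA * t) * math.exp(-BETA * t), GAMMA)


def log_f0(t, fb, phrases, accents):
    """ln F0(t) = ln Fb + sum Ap*Gp(t-T0) + sum Aa*[Ga(t-T1) - Ga(t-T2)]."""
    value = math.log(fb)
    value += sum(p.ap * gp(t - p.t0) for p in phrases)
    value += sum(a.aa * (ga(t - a.t1) - ga(t - a.t2)) for a in accents)
    return value


def adapt_commands(phrases, accents, d_ap, d_aa):
    """Add predicted (target - original) differences to the original commands.

    Command timings are kept; only Ap and Aa change. The differences are
    plain inputs here; predicting them is outside this sketch.
    """
    if len(d_ap) != len(phrases) or len(d_aa) != len(accents):
        raise ValueError("one difference per command is required")
    new_phrases = [replace(p, ap=p.ap + d) for p, d in zip(phrases, d_ap)]
    new_accents = [replace(a, aa=a.aa + d) for a, d in zip(accents, d_aa)]
    return new_phrases, new_accents


# Made-up commands for one utterance in the original style (Fb = 120 Hz).
phrases = [PhraseCommand(t0=0.0, ap=0.5)]
accents = [AccentCommand(0.2, 0.6, 0.4), AccentCommand(0.9, 1.3, 0.3)]
# Hypothetical predicted differences: one per phrase and per accent command.
new_phrases, new_accents = adapt_commands(phrases, accents, [0.1], [0.15, -0.05])
# Adapted F0 contour sampled every 10 ms over 1.5 s.
f0_target = [math.exp(log_f0(k / 100, 120.0, new_phrases, new_accents)) for k in range(150)]
```

Because only Ap and Aa are modified while Fb and all command timings are reused, the adapted contour keeps the phrase and accent positions of the original; this mirrors the abstract, which mentions only magnitude/amplitude differences.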
Similar resources
Using F0 Contour Generation Process Model for Improved and Flexible Control of Prosodic Features in HMM-based Speech Synthesis
The generation process model of fundamental frequency contours, known as Fujisaki's model, is ideal for representing global features of prosody. It is a command-response model, where the commands have clear relations with the linguistic and para-/non-linguistic information included in the utterance. Therefore, by controlling fundamental frequency contours in the framework of the generation process model, a mo...
Fundamental Frequency Contour Reshaping in HMM-based Speech Synthesis and Realization of Prosodic Focus Using Generation Process Model
Frame-by-frame representation is not appropriate for prosodic features, which are tightly related to speech units spanning a wide time range, such as words, phrases and so on. This causes an inherent problem in fundamental frequency (F0) contour generation by HMM-based speech synthesis. Our previously developed method, which modifies generated F0 contours in the framework of the generation process mo...
Analytical Study on Fundamental Frequency Contours of Thai Expressive Speech Using Fujisaki’s Model
Problem statement: In spontaneous speech communication, prosody is an important factor that must be taken into account, since prosody affects not only the naturalness but also the intelligibility of speech. Focusing on the synthesis of Thai expressive speech, a number of systems have been developed over the years. However, the expressive speech with various speaking styles has not been accomplishe...
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics in the multimedia domain is enabling computers to produce speech. Speech synthesis gives computers the human ability to produce speech. Data-based and process-based approaches are the two main approaches to speech synthesis, and each has its own challenges. Unit-selection speech synthesis and statistical parametr...
Corpus-based synthesis of fundamental frequency contours with various speaking styles from text using F0 contour generation process model
A corpus-based method of generating fundamental frequency (F0) contours of various speaking styles from text was developed. Instead of directly predicting F0 values, the method predicts command values of the F0 contour generation process model. Because of the model constraint, the resulting F0 contour retains a certain naturalness even when the prediction is inaccurate. The method includes a ...